Abstract

Counting is a fundamental operation. For example, counting the $\alpha$th frequency moment, $F_{(\alpha)} = \sum_{i=1}^{D} A_t[i]^{\alpha}$, of a streaming signal $A_t$ (where $t$ denotes time) has been an active area of research in theoretical computer science, databases, and data mining. When $\alpha = 1$, the task (i.e., counting the sum) can be accomplished using a counter. When $\alpha \neq 1$, however, it becomes non-trivial to design a small-space (i.e., low-memory) counting system.

Compressed Counting (CC) is proposed for efficiently computing the $\alpha$th frequency moment of a data stream $A_t$, where $0 < \alpha \le 2$. CC is applicable if the streaming data follow the Turnstile model, with the restriction that at the time $t$ for the evaluation, $A_t[i] \ge 0$, $\forall i \in [1, D]$, which includes the strict Turnstile model as a special case. For data streams in practice, this restriction is minor.

The underlying technique is skewed stable random projections, which captures the intuition that, when $\alpha = 1$, a simple counter suffices, and when $\alpha = 1 \pm \Delta$ with small $\Delta$, the sample complexity should be low (continuously as a function of $\Delta$). We show the sample complexity (number of projections) $k = G \frac{1}{\epsilon^2} \log\left(\frac{2}{\delta}\right)$, where $G = O(\epsilon)$ as $\Delta \to 0$. In other words, for small $\Delta$, $k = O(1/\epsilon)$ instead of $O(1/\epsilon^2)$.

The case $\Delta \to 0$ is practically very important. It is now well understood that one can obtain good approximations to the entropies of data streams using the $\alpha$th moments with $\alpha = 1 \pm \Delta$ and very small $\Delta$. For statistical inference using the method of moments, it is sometimes reasonable to use the $\alpha$th moments with $\alpha$ very close to 1. As another example, $\Delta$ might be the "decay rate" or "interest rate," which is usually small. Thus, Compressed Counting will be an ideal tool for estimating the total value in the future, taking into account the effect of decay or interest accrual.

Finally, another contribution is an algorithm for approximating the logarithmic norm, $\sum_{i=1}^{D} \log A_t[i]$, and the logarithmic distance, $\sum_{i=1}^{D} \log |A_t[i] - B_t[i]|$. The logarithmic norm arises in statistical estimation. The logarithmic distance is useful in machine learning practice with heavy-tailed data.

1 Introduction 

This paper focuses on counting, which is among the most fundamental operations in almost every field of science and engineering. Computing the sum $\sum_{i=1}^{D} A_t[i]$ is the simplest counting ($t$ denotes time). Counting the $\alpha$th moment $\sum_{i=1}^{D} A_t[i]^{\alpha}$ is more general. When $\alpha \to 0+$, $\sum_{i=1}^{D} A_t[i]^{\alpha}$ counts the total number of non-zeros in $A_t$. When $\alpha = 2$, $\sum_{i=1}^{D} A_t[i]^{\alpha}$ counts the "energy" or "power" of the signal $A_t$. If $A_t$ actually outputs the power of an underlying signal $B_t$, counting the sum $\sum_{i=1}^{D} B_t[i]$ is equivalent to computing $\sum_{i=1}^{D} A_t[i]^{1/2}$.
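The special cases above can be checked on a small static vector. The following is only an illustrative sketch: the function name `frequency_moment` and the toy data are assumptions made for this example, and the computation is exact (it stores every entry), so it is not the low-memory scheme this paper proposes.

```python
def frequency_moment(signal, alpha):
    """Exact alpha-th frequency moment sum_i A[i]**alpha of a non-negative signal.

    Zero entries are skipped, so alpha -> 0+ counts the non-zero entries.
    """
    if any(a < 0 for a in signal):
        raise ValueError("entries must be non-negative")
    return sum(a ** alpha for a in signal if a > 0)


A = [4.0, 0.0, 9.0, 1.0, 0.0]           # toy signal (hypothetical data)
print(frequency_moment(A, 1e-9))         # close to 3, the number of non-zeros
print(frequency_moment(A, 1))            # the plain sum
print(frequency_moment(A, 2))            # the "energy"
print(frequency_moment(A, 0.5))          # sum of square roots
```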

Here, $A_t$ denotes a time-varying signal, for example, data streams. In the literature, the $\alpha$th frequency moment of a data stream $A_t$ is defined as

$$F_{(\alpha)} = \sum_{i=1}^{D} |A_t[i]|^{\alpha}.$$

Counting $F_{(\alpha)}$ for massive data streams is practically important, among many challenging issues in data stream computations. In fact, the general theme of "scaling up for high dimensional data and high speed data streams" is among the "ten challenging problems in data mining research."

Because the elements, $A_t[i]$, are time-varying, a naive counting mechanism requires a system of $D$ counters to compute $F_{(\alpha)}$ exactly. This is not always realistic when $D$ is large and we only need an approximate answer. For example, $D$ may be $2^{64}$ if $A_t$ records the arrivals of IP addresses. Or, $D$ can be the total number of checking/savings accounts.

Compressed Counting (CC) is a new scheme for approximating the $\alpha$th frequency moments of data streams (where $0 < \alpha \le 2$) using low memory. The underlying technique is based on what we call skewed stable random projections.

1.1 The Data Models 

We consider the popular Turnstile data stream model [17]. The input stream $a_t = (i_t, I_t)$, $i_t \in [1, D]$, arriving sequentially describes the underlying signal $A$, meaning $A_t[i_t] = A_{t-1}[i_t] + I_t$. The increment $I_t$ can be either positive (insertion) or negative (deletion). Restricting $I_t \ge 0$ results in the cash register model. Restricting $A_t[i] \ge 0$ at all $t$ (but $I_t$ can still be either positive or negative) results in the strict Turnstile model, which suffices for describing most (although not all) natural phenomena. For example [17], in a database, a record can only be deleted if it was previously inserted. Another example is the checking/savings account, which allows deposits/withdrawals but generally does not allow overdraft.

Compressed Counting (CC) is applicable when, at the time $t$ for the evaluation, $A_t[i] \ge 0$ for all $i$. This is more flexible than the strict Turnstile model, which requires $A_t[i] \ge 0$ at all $t$. In other words, CC is applicable when data streams are (a) insertion only (i.e., the cash register model), or (b) always non-negative (i.e., the strict Turnstile model), or (c) non-negative at check points. We believe our model suffices for describing most natural data streams in practice.

With the realistic restriction that $A_t[i] \ge 0$ at $t$, the definition of the $\alpha$th frequency moment becomes

$$F_{(\alpha)} = \sum_{i=1}^{D} A_t[i]^{\alpha},$$

and the case $\alpha = 1$ becomes trivial, because

$$F_{(1)} = \sum_{i=1}^{D} A_t[i] = \sum_{s=1}^{t} I_s.$$

In other words, for $F_{(1)}$, we need only a simple counter to accumulate all values of increment/decrement $I_t$.
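The identity $F_{(1)} = \sum_{s} I_s$ can be illustrated on a toy Turnstile stream. This is a minimal sketch under assumptions: the helper name `run_turnstile` and the example updates are hypothetical, and the full signal is stored only so that the single counter can be compared against it.

```python
from collections import defaultdict


def run_turnstile(stream):
    """Apply Turnstile updates (i_t, I_t): A[i_t] += I_t.

    Returns the full signal and a single running counter of all increments.
    """
    signal = defaultdict(float)
    counter = 0.0
    for index, increment in stream:
        signal[index] += increment   # A_t[i_t] = A_{t-1}[i_t] + I_t
        counter += increment         # one counter is enough when alpha = 1
    return dict(signal), counter


# Insertions and deletions; entries are non-negative at evaluation time.
stream = [(3, 5.0), (7, 2.0), (3, -4.0), (1, 6.0), (7, -2.0)]
signal, counter = run_turnstile(stream)
print(sum(signal.values()), counter)   # the two values agree
```

The counter uses constant memory regardless of $D$, which is the behavior CC aims to approach when $\alpha$ is close to 1.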

For $\alpha \neq 1$, however, counting $F_{(\alpha)}$ is still a non-trivial problem. Intuitively, there should exist an intelligent counting system that performs almost like a simple counter when $\alpha = 1 \pm \Delta$ with small $\Delta$. The parameter $\Delta$ may bear a clear physical meaning. For example, $\Delta$ may be the "decay rate" or "interest rate," which is usually small.

The proposed Compressed Counting (CC) provides such an intelligent counting system. Because its underlying technique is based on skewed stable random projections, we provide a brief introduction to skewed stable distributions.

1.2 Skewed Stable Distributions 

A random variable $Z$ follows a $\beta$-skewed $\alpha$-stable distribution if the Fourier transform of its density is [21]

$$\mathscr{F}_Z(t) = E \exp\left(\sqrt{-1}\, Z t\right) = \exp\left(-F |t|^{\alpha} \left(1 - \sqrt{-1}\, \beta\, \text{sign}(t) \tan\left(\frac{\pi\alpha}{2}\right)\right)\right), \quad \alpha \neq 1,$$

where $-1 \le \beta \le 1$ and $F > 0$ is the scale parameter. We denote $Z \sim S(\alpha, \beta, F)$. Here $0 < \alpha \le 2$. When $\alpha < 0$, the inverse Fourier transform is unbounded; and when $\alpha > 2$, the inverse Fourier transform is not a probability density. This is why Compressed Counting is limited to $0 < \alpha \le 2$.

Consider two independent variables, $Z_1, Z_2 \sim S(\alpha, \beta, 1)$. For any non-negative constants $C_1$ and $C_2$, the "$\alpha$-stability" follows from properties of Fourier transforms:

$$Z = C_1 Z_1 + C_2 Z_2 \sim S\left(\alpha, \beta, C_1^{\alpha} + C_2^{\alpha}\right).$$

However, if $C_1$ and $C_2$ do not have the same signs, the above "stability" does not hold (unless $\beta = 0$ or $\alpha = 2, 0+$). To see this, we consider $Z = C_1 Z_1 - C_2 Z_2$, with $C_1 \ge 0$ and $C_2 \ge 0$. Then, because $\mathscr{F}_{-Z_2}(t) = \mathscr{F}_{Z_2}(-t)$,

$$\mathscr{F}_Z = \exp\left(-|C_1 t|^{\alpha} \left(1 - \sqrt{-1}\, \beta\, \text{sign}(t) \tan\left(\frac{\pi\alpha}{2}\right)\right)\right) \times \exp\left(-|C_2 t|^{\alpha} \left(1 + \sqrt{-1}\, \beta\, \text{sign}(t) \tan\left(\frac{\pi\alpha}{2}\right)\right)\right),$$

which does not represent a stable law, unless $\beta = 0$ or $\alpha = 2, 0+$. This is the fundamental reason why Compressed Counting needs the restriction that at the time of evaluation, elements in the data streams should have the same signs.

1.3 Skewed Stable Random Projections 

Given $R \in \mathbb{R}^{D}$ with each element $r_i \sim S(\alpha, \beta, 1)$ i.i.d., then

$$R^{T} A_t = \sum_{i=1}^{D} r_i A_t[i] \sim S\left(\alpha, \beta, F_{(\alpha)} = \sum_{i=1}^{D} A_t[i]^{\alpha}\right),$$

meaning $R^{T} A_t$ represents one sample of the stable distribution whose scale parameter $F_{(\alpha)}$ is what we are after.

Of course, we need more than one sample to estimate $F_{(\alpha)}$. We can generate a matrix $R \in \mathbb{R}^{D \times k}$ with each entry $r_{ij} \sim S(\alpha, \beta, 1)$. The resultant vector $X = R^{T} A_t \in \mathbb{R}^{k}$ contains $k$ i.i.d. samples: $x_j \sim S(\alpha, \beta, F_{(\alpha)})$, $j = 1$ to $k$.

Note that this is a linear projection; and recall that the Turnstile model is also linear. Thus, skewed stable random projections are applicable to dynamic data streams. For every incoming $a_t = (i_t, I_t)$, we update $x_j \leftarrow x_j + r_{i_t j} I_t$ for $j = 1$ to $k$. This way, at any time $t$, we maintain $k$ i.i.d. stable samples. The remaining task is to recover $F_{(\alpha)}$, which is a statistical estimation problem.
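The update rule relies only on linearity, which the sketch below tries to make concrete. It is a hedged illustration rather than the paper's implementation: the names `make_projection` and `update` are hypothetical, the projection matrix is stored explicitly for clarity, and the entries are drawn from $N(0, 2)$, which is what $S(\alpha, \beta, 1)$ reduces to at $\alpha = 2$ under the characteristic function above. A general skewed stable sampler would replace that one line.

```python
import numpy as np


def make_projection(D, k, rng):
    # Stand-in for S(alpha, beta, 1) draws: at alpha = 2 the law is N(0, 2).
    return rng.normal(0.0, np.sqrt(2.0), size=(D, k))


def update(x, R, i, increment):
    # x_j <- x_j + r_{i j} * I_t for j = 1..k, done in place.
    x += R[i] * increment
    return x


rng = np.random.default_rng(0)
D, k = 8, 4
R = make_projection(D, k, rng)
x = np.zeros(k)      # the k maintained samples
A = np.zeros(D)      # full signal, kept here only to verify the sketch

for i, increment in [(2, 3.0), (5, 1.5), (2, -1.0), (0, 4.0)]:
    update(x, R, i, increment)
    A[i] += increment

print(np.allclose(x, R.T @ A))   # incremental updates equal R^T A_t
```

Only the $k$ entries of `x` need to be maintained as the stream evolves; `A` is never needed by the sketch itself.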

1.4 Counting in Statistical/Learning Applications 

The method of moments is often convenient and popular in statistical parameter estimation. Consider, for example, the three-parameter generalized gamma distribution $GG(\theta, \gamma, \eta)$, which is highly flexible for modeling positive data, e.g., [15]. If $X \sim GG(\theta, \gamma, \eta)$, then the first three moments are $E(X) = \theta\gamma$, $\text{Var}(X) = \theta\gamma^2$, $E(X - E(X))^3 = (\eta + 1)\theta\gamma^3$. Thus, one can estimate $\theta$, $\gamma$ and $\eta$ from $D$ i.i.d. samples $x_i \sim GG(\theta, \gamma, \eta)$ by counting the first three empirical moments from the data. However, some moments may be (much) easier to compute than others if the data $x_i$'s are collected from data streams. Instead of using integer moments, the parameters can also be estimated from any three fractional moments, i.e., $\sum_{i=1}^{D} x_i^{\alpha}$, for three different values of $\alpha$. Because $D$ is very large, any consistent estimator is likely to provide a good estimate. Thus, it might be reasonable to choose $\alpha$ mainly based on the computational cost. See Appendix A for comments on the situation in which one may also care about the relative accuracy caused by different choices of $\alpha$.

The logarithmic norm $\sum_{i=1}^{D} \log x_i$ arises in statistical estimation, for example, the maximum likelihood estimators for the Pareto and gamma distributions. Since it is closely connected to the moment problem, Section 4 provides an algorithm for approximating the logarithmic norm, as well as for the logarithmic distance; the latter can be quite useful in machine learning practice with massive heavy-tailed data (either dynamic or static) in lieu of the usual $l_2$ distance.

Entropy is also an important summary statistic. Recently [20] proposed to approximate the entropy moment $\sum_{i=1}^{D} x_i \log x_i$ using the $\alpha$th moments with $\alpha = 1 \pm \Delta$ and very small $\Delta$.
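One way to see why moments near $\alpha = 1$ carry entropy information is that $\frac{d}{d\alpha} \sum_i x_i^{\alpha}$ at $\alpha = 1$ equals $\sum_i x_i \log x_i$. The sketch below checks this with a central difference on exact moments. It is an assumption-laden illustration: the function names and data are hypothetical, it is not claimed to be the scheme of [20], and in a streaming setting the exact moments would be replaced by estimates.

```python
import math


def moment(x, alpha):
    """Exact alpha-th moment sum_i x_i**alpha of positive data."""
    return sum(v ** alpha for v in x if v > 0)


def entropy_moment_fd(x, delta):
    """Central difference of alpha -> F_(alpha) at alpha = 1 with step delta."""
    return (moment(x, 1 + delta) - moment(x, 1 - delta)) / (2 * delta)


x = [0.5, 2.0, 3.0, 7.5]                        # toy data (hypothetical)
exact = sum(v * math.log(v) for v in x)         # sum_i x_i log x_i
approx = entropy_moment_fd(x, 1e-4)             # uses alpha = 1 +/- 1e-4
print(exact, approx)
```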

1.5 Comparisons with Previous Studies 

Pioneered by [1], there have been many studies on approximating the $\alpha$th frequency moment $F_{(\alpha)}$. [1] considered integer moments, $\alpha = 0, 1, 2$, as well as $\alpha > 2$. Soon after, [5, 9] provided improved algorithms for $0 < \alpha \le 2$. [18, 3] proved the sample complexity lower bounds for $\alpha > 2$. [19] proved the optimal lower bounds for all frequency moments, except for $\alpha = 1$, because for non-negative data, $F_{(1)}$ can be computed essentially error-free with a counter [16, 6, 1]. [11] provided algorithms for $\alpha > 2$ to (essentially) achieve the lower bounds proved in [18, 3].

Note that an algorithm which "achieves the optimal bound" is not necessarily practical because the constant may be very large. In a sense, the method based on symmetric stable random projections [10] is one of the few successful algorithms that are simple and free of large constants. [10] described the procedure for approximating $F_{(\alpha)}$ in data streams and proved the bound for $\alpha = 1$ (although not explicitly). For $\alpha \neq 1$, [10] provided a conceptual algorithm. [14] proposed various estimators for symmetric stable random projections and provided the constants explicitly for all $0 < \alpha \le 2$.

None of the previous studies, however, captures the intuition that, when $\alpha = 1$, a simple counter suffices for computing $F_{(1)}$ (essentially) error-free, and when $\alpha = 1 \pm \Delta$ with small $\Delta$, the sample complexity (number of projections, $k$) should be low and vary continuously as a function of $\Delta$.

Compressed Counting (CC) is proposed for $0 < \alpha \le 2$ and it works particularly well when $\alpha = 1 \pm \Delta$ with small $\Delta$. This can be practically very useful. For example, $\Delta$ may be the "decay rate" or the "interest rate," which is usually small; thus CC can count the total value in the future taking into account the effect of decay or interest accrual. In parameter estimation using the method of moments, one may choose the $\alpha$th moments with $\alpha$ close to 1. Also, one can approximate the entropy moment using the $\alpha$th moments with $\alpha = 1 \pm \Delta$ and very small $\Delta$ [20].

Our study has connections to the Johnson-Lindenstrauss Lemma [12], which proved $k = O(1/\epsilon^2)$ at $\alpha = 2$. An analogous bound holds for $0 < \alpha \le 2$ [10, 14]. The dependency on $1/\epsilon^2$ may raise concerns if, say, $\epsilon < 0.1$. We will show that CC achieves $k = O(1/\epsilon)$ in the neighborhood of $\alpha = 1$.

1.6 Two Statistical Estimators 

Recall that Compressed Counting (CC) boils down to a statistical estimation problem. That is, given $k$ i.i.d. samples $x_j \sim S(\alpha, \beta = 1, F_{(\alpha)})$, estimate the scale parameter $F_{(\alpha)}$. Section 2 will explain why we fix $\beta = 1$.

Part of this paper is to provide estimators which are convenient for theoretical analysis, e.g., tail bounds. We provide the geometric mean and the harmonic mean estimators, whose asymptotic variances are illustrated in Figure 1.



Figure 1: Let $\hat{F}$ be an estimator of $F$ with asymptotic variance $\text{Var}(\hat{F}) = V \frac{F^2}{k} + O\left(\frac{1}{k^2}\right)$. We plot the $V$ values for the geometric mean and the harmonic mean estimators, along with the $V$ values for the geometric mean estimator in [14] (symmetric GM). When $\alpha \to 1$, our method achieves an "infinite improvement" in terms of the asymptotic variances.

For $\alpha < 1$, the harmonic mean estimator is considerably more accurate than $\hat{F}_{(\alpha),gm}$ and its sample complexity bound is also provided in an explicit form. Here $\Gamma(\cdot)$ is the usual gamma function.

1.7 Paper Organization 

Section 2 begins with analyzing the moments of skewed stable distributions, from which the geometric mean and harmonic mean estimators are derived. Section 2 is then devoted to the detailed analysis of the geometric mean estimator.

Section 3 analyzes the harmonic mean estimator. Section 4 addresses the application of CC in statistical parameter estimation and an algorithm for approximating the logarithmic norm and distance. The proofs are presented as appendices.

2 The Geometric Mean Estimator 

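We first prove a fundamental result about the moments of skewed stable distributions.

Lemma 1 If $Z \sim S(\alpha, \beta, F_{(\alpha)})$, then for any $-1 < \lambda < \alpha$,

$$E\left(|Z|^{\lambda}\right) = F_{(\alpha)}^{\lambda/\alpha} \cos\left(\frac{\lambda}{\alpha} \tan^{-1}\left(\beta \tan\left(\frac{\alpha\pi}{2}\right)\right)\right) \left(1 + \beta^2 \tan^2\left(\frac{\alpha\pi}{2}\right)\right)^{\frac{\lambda}{2\alpha}} \frac{2}{\pi} \sin\left(\frac{\pi}{2}\lambda\right) \Gamma\left(1 - \frac{\lambda}{\alpha}\right) \Gamma(\lambda),$$

which can be simplified when $\beta = 1$, to be

$$E\left(|Z|^{\lambda}\right) = F_{(\alpha)}^{\lambda/\alpha} \frac{\cos\left(\frac{\kappa(\alpha)}{\alpha} \frac{\lambda\pi}{2}\right)}{\cos^{\lambda/\alpha}\left(\frac{\kappa(\alpha)\pi}{2}\right)} \frac{2}{\pi} \sin\left(\frac{\pi}{2}\lambda\right) \Gamma\left(1 - \frac{\lambda}{\alpha}\right) \Gamma(\lambda),$$

where $\kappa(\alpha) = \alpha$ if $\alpha < 1$ and $\kappa(\alpha) = 2 - \alpha$ if $\alpha > 1$. For $\alpha < 1$, and $-\infty < \lambda < \alpha$,

$$E\left(|Z|^{\lambda}\right) = E\left(Z^{\lambda}\right) = F_{(\alpha)}^{\lambda/\alpha} \frac{\Gamma\left(1 - \frac{\lambda}{\alpha}\right)}{\cos^{\lambda/\alpha}\left(\frac{\alpha\pi}{2}\right) \Gamma(1 - \lambda)}.$$

Proof: See Appendix B. □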
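As a sanity check on the $\beta = 1$ moment formula, $E(|Z|^{\lambda}) = F^{\lambda/\alpha} \frac{\cos(\kappa(\alpha)\lambda\pi/(2\alpha))}{\cos^{\lambda/\alpha}(\kappa(\alpha)\pi/2)} \frac{2}{\pi}\sin(\pi\lambda/2)\Gamma(1-\lambda/\alpha)\Gamma(\lambda)$ as derived in Appendix B, one can compare it at $\alpha = 2$ with the standard absolute moment of a Gaussian, since $S(2, \beta, F)$ is $N(0, 2F)$. The sketch below does only that; the function names are hypothetical, and agreement at $\alpha = 2$ does not by itself validate the formula for other $\alpha$.

```python
import math


def abs_moment_beta1(lam, alpha, F):
    """E|Z|^lam for Z ~ S(alpha, beta=1, F); needs -1 < lam < alpha, lam != 0."""
    kappa = alpha if alpha < 1 else 2 - alpha
    return (F ** (lam / alpha)
            * math.cos(kappa / alpha * lam * math.pi / 2)
            / math.cos(kappa * math.pi / 2) ** (lam / alpha)
            * (2 / math.pi) * math.sin(math.pi * lam / 2)
            * math.gamma(1 - lam / alpha) * math.gamma(lam))


def abs_moment_gaussian(lam, variance):
    """E|X|^lam for X ~ N(0, variance), lam > -1 (standard closed form)."""
    return ((2 * variance) ** (lam / 2)
            * math.gamma((lam + 1) / 2) / math.sqrt(math.pi))


F = 3.7   # arbitrary scale parameter for the check
for lam in (-0.5, 0.3, 0.7, 1.5):
    stable = abs_moment_beta1(lam, 2.0, F)
    gauss = abs_moment_gaussian(lam, 2 * F)   # S(2, ., F) is N(0, 2F)
    print(lam, stable, gauss)
```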



The geometric mean estimator, $\hat{F}_{(\alpha),gm}$:

$$\hat{F}_{(\alpha),gm} = \frac{\prod_{j=1}^{k} |x_j|^{\alpha/k}}{D_{gm}} \quad (D_{gm} \text{ depends on } \alpha \text{ and } k),$$

$$\text{Var}\left(\hat{F}_{(\alpha),gm}\right) = \frac{F_{(\alpha)}^2}{k} \frac{\pi^2}{12} \left(\alpha^2 + 2 - 3\kappa^2(\alpha)\right) + O\left(\frac{1}{k^2}\right),$$

$\kappa(\alpha) = \alpha$, if $\alpha < 1$; $\kappa(\alpha) = 2 - \alpha$, if $\alpha > 1$.

$\hat{F}_{(\alpha),gm}$ is unbiased. We prove the sample complexity explicitly and show $k = O(1/\epsilon)$ suffices for $\alpha$ around 1.

The harmonic mean estimator, $\hat{F}_{(\alpha),hm,c}$, for $\alpha < 1$.



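Recall that Compressed Counting boils down to estimating $F_{(\alpha)}$ from these $k$ i.i.d. samples $x_j \sim S(\alpha, \beta, F_{(\alpha)})$. Setting $\lambda = \frac{\alpha}{k}$ in Lemma 1 yields an unbiased estimator:

$$\hat{F}_{(\alpha),gm,\beta} = \frac{\prod_{j=1}^{k} |x_j|^{\alpha/k}}{D_{gm,\beta}},$$

$$D_{gm,\beta} = \cos^{k}\left(\frac{1}{k} \tan^{-1}\left(\beta \tan\left(\frac{\alpha\pi}{2}\right)\right)\right) \times \left(1 + \beta^2 \tan^2\left(\frac{\alpha\pi}{2}\right)\right)^{\frac{1}{2}} \left[\frac{2}{\pi} \sin\left(\frac{\pi\alpha}{2k}\right) \Gamma\left(1 - \frac{1}{k}\right) \Gamma\left(\frac{\alpha}{k}\right)\right]^{k}.$$

The following Lemma shows that the variance of $\hat{F}_{(\alpha),gm,\beta}$ decreases with increasing $\beta \in [0, 1]$.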


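Lemma 2 The variance of $\hat{F}_{(\alpha),gm,\beta}$,

$$\text{Var}\left(\hat{F}_{(\alpha),gm,\beta}\right) = F_{(\alpha)}^2 V_{gm,\beta},$$

$$V_{gm,\beta} = \frac{\cos^{k}\left(\frac{2}{k} \tan^{-1}\left(\beta \tan\left(\frac{\alpha\pi}{2}\right)\right)\right)}{\cos^{2k}\left(\frac{1}{k} \tan^{-1}\left(\beta \tan\left(\frac{\alpha\pi}{2}\right)\right)\right)} \times \frac{\left[\frac{2}{\pi} \sin\left(\frac{\pi\alpha}{k}\right) \Gamma\left(1 - \frac{2}{k}\right) \Gamma\left(\frac{2\alpha}{k}\right)\right]^{k}}{\left[\frac{2}{\pi} \sin\left(\frac{\pi\alpha}{2k}\right) \Gamma\left(1 - \frac{1}{k}\right) \Gamma\left(\frac{\alpha}{k}\right)\right]^{2k}} - 1,$$

is a decreasing function of $\beta \in [0, 1]$.

Proof: The result follows from the fact that

$$\frac{\cos\left(\frac{2}{k} \tan^{-1}\left(\beta \tan\left(\frac{\alpha\pi}{2}\right)\right)\right)}{\cos^{2}\left(\frac{1}{k} \tan^{-1}\left(\beta \tan\left(\frac{\alpha\pi}{2}\right)\right)\right)} = 2 - \sec^{2}\left(\frac{1}{k} \tan^{-1}\left(\beta \tan\left(\frac{\alpha\pi}{2}\right)\right)\right)$$

is a decreasing function of $\beta \in [0, 1]$. □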

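Therefore, for attaining the smallest variance, we take $\beta = 1$. For brevity, we simply use $\hat{F}_{(\alpha),gm}$ instead of $\hat{F}_{(\alpha),gm,1}$. In fact, the rest of the paper will always consider $\beta = 1$ only.

We rewrite $\hat{F}_{(\alpha),gm}$ (i.e., $\hat{F}_{(\alpha),gm,\beta=1}$) as

$$\hat{F}_{(\alpha),gm} = \frac{\prod_{j=1}^{k} |x_j|^{\alpha/k}}{D_{gm}}, \qquad D_{gm} = \frac{\cos^{k}\left(\frac{\kappa(\alpha)\pi}{2k}\right)}{\cos\left(\frac{\kappa(\alpha)\pi}{2}\right)} \left[\frac{2}{\pi} \sin\left(\frac{\pi\alpha}{2k}\right) \Gamma\left(1 - \frac{1}{k}\right) \Gamma\left(\frac{\alpha}{k}\right)\right]^{k}. \quad (4)$$

Here, $\kappa(\alpha) = \alpha$, if $\alpha < 1$, and $\kappa(\alpha) = 2 - \alpha$ if $\alpha > 1$. Lemma 3 concerns the asymptotic moments of $\hat{F}_{(\alpha),gm}$.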

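Lemma 3 As $k \to \infty$,

$$\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right) \frac{2}{\pi} \Gamma\left(\frac{\alpha}{k}\right) \Gamma\left(1 - \frac{1}{k}\right) \sin\left(\frac{\pi}{2} \frac{\alpha}{k}\right)\right]^{k} \to \exp\left(-\gamma_e(\alpha - 1)\right), \quad (5)$$

monotonically with increasing $k$ ($k \ge 2$), where $\gamma_e = 0.57724...$ is Euler's constant. For any fixed $t$, as $k \to \infty$,

$$E\left(\left(\hat{F}_{(\alpha),gm}\right)^{t}\right) = F_{(\alpha)}^{t} \exp\left(\frac{1}{k} \frac{\pi^2 (t^2 - t)}{24} \left(\alpha^2 + 2 - 3\kappa^2(\alpha)\right) + O\left(\frac{1}{k^2}\right)\right),$$

$$\text{Var}\left(\hat{F}_{(\alpha),gm}\right) = \frac{F_{(\alpha)}^2}{k} \frac{\pi^2}{12} \left(\alpha^2 + 2 - 3\kappa^2(\alpha)\right) + O\left(\frac{1}{k^2}\right).$$

Proof: See Appendix C. □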

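In (4), the denominator $D_{gm}$ depends on $k$ for small $k$. For convenience in analyzing tail bounds, we consider an asymptotically equivalent geometric mean estimator. By (5), $D_{gm} \to \exp\left(-\gamma_e(\alpha - 1)\right) / \cos\left(\frac{\kappa(\alpha)\pi}{2}\right)$ as $k \to \infty$, so that $1/D_{gm}$ is replaced by its limit:

$$\hat{F}_{(\alpha),gm,b} = \exp\left(\gamma_e(\alpha - 1)\right) \cos\left(\frac{\kappa(\alpha)\pi}{2}\right) \prod_{j=1}^{k} |x_j|^{\alpha/k}.$$

Lemma 4 provides the tail bounds for $\hat{F}_{(\alpha),gm,b}$ and Figure 2 plots the tail bound constants. One can infer the tail bounds for $\hat{F}_{(\alpha),gm}$ from the monotonicity result (5).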



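Lemma 4 The right tail bound:

$$\Pr\left(\hat{F}_{(\alpha),gm,b} - F_{(\alpha)} \ge \epsilon F_{(\alpha)}\right) \le \exp\left(-k \frac{\epsilon^2}{G_{R,gm}}\right), \quad \epsilon > 0,$$

and the left tail bound:

$$\Pr\left(\hat{F}_{(\alpha),gm,b} - F_{(\alpha)} \le -\epsilon F_{(\alpha)}\right) \le \exp\left(-k \frac{\epsilon^2}{G_{L,gm}}\right), \quad 0 < \epsilon < 1,$$

$$\frac{\epsilon^2}{G_{R,gm}} = C_R \log(1 + \epsilon) - C_R \gamma_e(\alpha - 1) - \log\left(\cos\left(\frac{\kappa(\alpha)\pi C_R}{2}\right) \frac{2}{\pi} \Gamma(\alpha C_R) \Gamma(1 - C_R) \sin\left(\frac{\pi\alpha C_R}{2}\right)\right),$$

$$\frac{\epsilon^2}{G_{L,gm}} = -C_L \log(1 - \epsilon) + C_L \gamma_e(\alpha - 1) + \log\alpha - \log\left(\cos\left(\frac{\kappa(\alpha)\pi C_L}{2}\right) \Gamma(C_L)\right) + \log\left(\Gamma(\alpha C_L) \cos\left(\frac{\pi\alpha C_L}{2}\right)\right).$$

$C_R$ and $C_L$ are solutions to

$$-\gamma_e(\alpha - 1) + \log(1 + \epsilon) + \frac{\kappa(\alpha)\pi}{2} \tan\left(\frac{\kappa(\alpha)\pi}{2} C_R\right) - \frac{\alpha\pi/2}{\tan\left(\frac{\alpha\pi}{2} C_R\right)} - \psi(\alpha C_R)\alpha + \psi(1 - C_R) = 0,$$

$$\log(1 - \epsilon) - \gamma_e(\alpha - 1) - \frac{\kappa(\alpha)\pi}{2} \tan\left(\frac{\kappa(\alpha)\pi}{2} C_L\right) + \frac{\alpha\pi}{2} \tan\left(\frac{\alpha\pi}{2} C_L\right) - \psi(\alpha C_L)\alpha + \psi(C_L) = 0.$$

Here $\psi(z) = \frac{\Gamma'(z)}{\Gamma(z)}$ is the "Psi" function.

Proof: See Appendix D. □

[Figure 2 panels recovered from the source: (c) Left bound, $\alpha < 1$; (d) Left bound, $\alpha > 1$.]

Figure 2: The tail bound constants of $\hat{F}_{(\alpha),gm,b}$ in Lemma 4.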

It is important to understand the behavior of the tail bounds as $\alpha = 1 \pm \Delta \to 1$. ($\alpha = 1 - \Delta$ if $\alpha < 1$; and $\alpha = 1 + \Delta$ if $\alpha > 1$.) See more comments in Appendix A. Lemma 5 describes the precise rates of convergence.



Lemma 5 For fixed $\epsilon$, as $\alpha \to 1$ (i.e., $\Delta \to 0$),

Proof: See Appendix E. □

Figure 3: The tail bound constants proved in Lemma 4 and the approximations in Lemma 5 for small $\Delta$.

Figure 3 plots the constants for small values of $\Delta$, along with the approximations suggested in Lemma 5. Since we usually consider that $\epsilon$ should not be too large, we can write, as $\alpha \to 1$, $G_{R,gm} = O(\epsilon)$ and $G_{L,gm} = O(\epsilon)$ if $\alpha > 1$; both at the rate $O\left(\sqrt{\Delta}\right)$. However, if $\alpha < 1$, $G_{L,gm} = O\left(\epsilon \exp\left(-\frac{\epsilon}{\Delta}\right)\right)$, which is extremely fast.

The sample complexity bound is then straightforward.

Lemma 6 Using the geometric mean estimator, it suffices to let $k = G \frac{1}{\epsilon^2} \log\left(\frac{2}{\delta}\right)$ so that the error will be within a $1 \pm \epsilon$ factor with probability $1 - \delta$, where $G = \max(G_{R,gm}, G_{L,gm})$.
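The bound $k = G \frac{1}{\epsilon^2} \log\left(\frac{2}{\delta}\right)$ is easy to evaluate once a value of $G$ is available. The sketch below is illustrative only: the function name `num_projections` is hypothetical, and the two values of $G$ are placeholders chosen to contrast a constant $G$ with the regime $G = O(\epsilon)$ (here taken with constant 1), not values computed from Lemma 4.

```python
import math


def num_projections(G, eps, delta):
    """Smallest integer k with k >= G / eps**2 * log(2 / delta)."""
    if not (G > 0 and eps > 0 and 0 < delta < 1):
        raise ValueError("need G > 0, eps > 0 and 0 < delta < 1")
    return math.ceil(G / eps ** 2 * math.log(2 / delta))


eps, delta = 0.1, 0.05
k_constant_G = num_projections(1.0, eps, delta)   # G treated as a constant
k_small_delta = num_projections(eps, eps, delta)  # G = eps, i.e. k ~ 1/eps
print(k_constant_G, k_small_delta)
```

With $G$ proportional to $\epsilon$, the required $k$ scales as $1/\epsilon$ rather than $1/\epsilon^2$, which is the point made in the text.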
3 The Harmonic Mean Estimator

For $\alpha < 1$, the harmonic mean estimator can considerably improve $\hat{F}_{(\alpha),gm}$. Unlike the harmonic mean estimator in [14], which is useful only for small $\alpha$ and has no exponential tail bounds except for $\alpha = 0+$, the harmonic mean estimator in this study has very nice tail properties for all $0 < \alpha < 1$.

The harmonic mean estimator takes advantage of the fact that if $Z \sim S(\alpha < 1, \beta = 1, F_{(\alpha)})$, then $E\left(|Z|^{\lambda}\right) = E\left(Z^{\lambda}\right)$ exists for all $-\infty < \lambda < \alpha$.



Lemma 7 Assume $k$ i.i.d. samples $x_j \sim S(\alpha < 1, \beta = 1, F_{(\alpha)})$, define the harmonic mean estimator $\hat{F}_{(\alpha),hm}$, and the bias-corrected harmonic mean estimator $\hat{F}_{(\alpha),hm,c}$.

Proof: See Appendix F. □

Figure 4: The tail bound constants of $\hat{F}_{(\alpha),hm}$ in Lemma 7, which are considerably smaller, compared to Figure 2(a)(c).



4 The Logarithmic Norm and Distance 

The logarithmic norm and distance can be important in practice. Consider estimating the parameters from $D$ i.i.d. samples $x_i \sim \text{Gamma}(\theta, \gamma)$. The density function is $f_X(x) = \frac{x^{\theta - 1} \exp(-x/\gamma)}{\gamma^{\theta} \Gamma(\theta)}$, and the log-likelihood is

$$(\theta - 1) \sum_{i=1}^{D} \log x_i - \sum_{i=1}^{D} \frac{x_i}{\gamma} - D\theta \log(\gamma) - D \log \Gamma(\theta).$$

If instead, $x_i \sim \text{Pareto}(\theta)$, $i = 1$ to $D$, then the density is $f_X(x) = \frac{\theta}{x^{\theta + 1}}$, $x \ge 1$, and the log-likelihood is

$$D \log\theta - (\theta + 1) \sum_{i=1}^{D} \log x_i.$$

Therefore, the logarithmic norm occurs at least in the context of maximum likelihood estimation for common distributions. Now, suppose the data $x_i$'s are actually the elements of data streams $A_t[i]$'s. Estimating $\sum_{i=1}^{D} \log A_t[i]$ becomes an interesting and practically meaningful problem.

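Our solution is based on the fact that, as $\alpha \to 0+$,

$$\frac{D}{\alpha} \log\left(\frac{1}{D} \sum_{i=1}^{D} A_t[i]^{\alpha}\right) \to \sum_{i=1}^{D} \log A_t[i],$$

which can be shown by L'Hopital's rule. More precisely,

$$\frac{D}{\alpha} \log\left(\frac{1}{D} \sum_{i=1}^{D} A_t[i]^{\alpha}\right) - \sum_{i=1}^{D} \log A_t[i] = \frac{\alpha}{2} \left(\sum_{i=1}^{D} \log^2 A_t[i] - \frac{1}{D} \left(\sum_{i=1}^{D} \log A_t[i]\right)^2\right) + O\left(\alpha^2\right),$$

which can be shown by Taylor expansions.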
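The limit $\frac{D}{\alpha} \log\left(\frac{1}{D} \sum_i A_t[i]^{\alpha}\right) \to \sum_i \log A_t[i]$ as $\alpha \to 0+$ can be observed numerically. The following sketch uses exact moments on a small static vector; the function name `log_norm_via_moment` and the data are assumptions for illustration, and in the streaming setting the exact moment would be replaced by a CC estimate.

```python
import math


def log_norm_via_moment(x, alpha):
    """Approximate sum_i log x_i by (D / alpha) * log(F_(alpha) / D)."""
    D = len(x)
    F = sum(v ** alpha for v in x)     # exact alpha-th moment, x_i > 0
    return D / alpha * math.log(F / D)


x = [0.3, 1.0, 2.5, 8.0, 40.0]          # strictly positive toy data
exact = sum(math.log(v) for v in x)

for alpha in (0.1, 0.01, 0.001):
    error = log_norm_via_moment(x, alpha) - exact
    print(alpha, error)                 # error shrinks roughly linearly in alpha
```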

Therefore, we obtain one solution to approximating the logarithmic norm using very small $\alpha$. Of course, we have assumed that $A_t[i] > 0$ strictly. In fact, this also suggests an approach for approximating the logarithmic distance between two streams, $\sum_{i=1}^{D} \log |A_t[i] - B_t[i]|$, provided we use symmetric stable random projections.

The logarithmic distance can be useful in machine learning practice with massive heavy-tailed data (either static or dynamic) such as image and text data. For those data, the usual $l_2$ distance would not be useful without "term-weighting" the data; and taking the logarithm is one simple weighting scheme. Thus, our method provides a direct way to compute pairwise distances, taking into account data weighting automatically.

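One may also be interested in the tail bounds, which, however, cannot be expressed in terms of the logarithmic norm (or distance). Nevertheless, we can obtain, e.g.,

$$\Pr\left(\frac{D}{\alpha} \log\left(\frac{1}{D} \hat{F}_{(\alpha),hm}\right) \ge (1 + \epsilon) \frac{D}{\alpha} \log\left(\frac{1}{D} F_{(\alpha)}\right)\right) \le \exp\left(-k \frac{\left(\left(F_{(\alpha)}/D\right)^{\epsilon} - 1\right)^2}{G_{R,hm}}\right), \quad \epsilon > 0,$$

$$\Pr\left(\frac{D}{\alpha} \log\left(\frac{1}{D} \hat{F}_{(\alpha),hm}\right) \le (1 - \epsilon) \frac{D}{\alpha} \log\left(\frac{1}{D} F_{(\alpha)}\right)\right) \le \exp\left(-k \frac{\left(1 - \left(D/F_{(\alpha)}\right)^{\epsilon}\right)^2}{G_{L,hm}}\right), \quad 0 < \epsilon < 1.$$

If $\hat{F}_{(\alpha),gm}$ is used, we just replace the corresponding constants in the above expressions. If we are interested in the logarithmic distance, we simply apply symmetric stable random projections and use an appropriate estimator of the distance; the corresponding tail bounds will have the same format.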

5 Conclusion 

Counting is a fundamental operation. In data streams $A_t[i]$, $i \in [1, D]$, counting the $\alpha$th frequency moments $F_{(\alpha)} = \sum_{i=1}^{D} A_t[i]^{\alpha}$ has been extensively studied. Our proposed Compressed Counting (CC) takes advantage of the fact that most data streams encountered in practice are non-negative, although they are subject to deletion and insertion. In fact, CC only requires that at the time $t$ for the evaluation, $A_t[i] \ge 0$; at other times, the data streams can actually go below zero.

Compressed Counting successfully captures the intuition that, when $\alpha = 1$, a simple counter suffices, and when $\alpha = 1 \pm \Delta$ with small $\Delta$, an intelligent counting system should require low space (continuously as a function of $\Delta$). The case with small $\Delta$ can be practically important. For example, $\Delta$ may be the "decay rate" or "interest rate," which is usually small. CC can also be very useful for statistical parameter estimation based on the method of moments. Also, one can approximate the entropy moment using the $\alpha$th moments with $\alpha = 1 \pm \Delta$ and very small $\Delta$.

Compared with previous studies, e.g., [10, 14], Compressed Counting achieves, in a sense, an "infinite improvement" in terms of the asymptotic variances when $\Delta \to 0$. Two estimators based on the geometric mean and the harmonic mean are provided in this study, including their variances, tail bounds, and sample complexity bounds.

We analyze our sample complexity bound $k = G \frac{1}{\epsilon^2} \log\left(\frac{2}{\delta}\right)$ in the neighborhood of $\alpha = 1$ and show $G = O(\epsilon)$ at small $\Delta$. This implies that our bound at small $\Delta$ is actually $k = O(1/\epsilon)$ instead of $O(1/\epsilon^2)$, which is required in the Johnson-Lindenstrauss Lemma and its various analogs.

Finally, we propose a scheme for approximating the log- 
arithmic norm and the logarithmic distance, useful in statis- 
tical parameter estimation and machine learning practice. 

We expect that new algorithms will soon be developed to take advantage of Compressed Counting. For example, via private communications, we have learned that a group is vigorously developing algorithms using projections with $\alpha = 1 \pm \Delta$ very close to 1, where $\Delta$ is their important parameter.

A An Example of Method of Moments 

We provide a (somewhat contrived) example of the method of moments. Suppose the observed data $x_i$'s are from data streams and suppose the data follow a gamma distribution $x_i \sim \text{Gamma}(\theta, 1)$, i.i.d. Here, we only consider one parameter $\theta$ so that we can analyze the variance easily.

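Suppose we estimate $\theta$ using the $\alpha$th moment. Because $E(x_i^{\alpha}) = \Gamma(\alpha + \theta)/\Gamma(\theta)$, we can solve for $\hat{\theta}$, the estimate of $\theta$, from

$$\frac{\Gamma(\alpha + \hat{\theta})}{\Gamma(\hat{\theta})} = \frac{1}{D} \sum_{i=1}^{D} x_i^{\alpha}, \qquad \text{Var}\left(\frac{1}{D} \sum_{i=1}^{D} x_i^{\alpha}\right) = \frac{1}{D} \left(\frac{\Gamma(2\alpha + \theta)}{\Gamma(\theta)} - \frac{\Gamma^2(\alpha + \theta)}{\Gamma^2(\theta)}\right).$$

By the "delta method" (i.e., $\text{Var}(h(x)) \approx \text{Var}(x) \left(h'(E(x))\right)^2$) and using the implicit derivative of $\hat{\theta}$, we obtain

$$\text{Var}\left(\hat{\theta}\right) \approx \frac{1}{D} \left(\frac{\Gamma(2\alpha + \theta) \Gamma(\theta)}{\Gamma^2(\alpha + \theta)} - 1\right) \frac{1}{\left(\psi(\alpha + \theta) - \psi(\theta)\right)^2},$$

where $\psi$ is the "Psi" function.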

One can verify that $\text{Var}(\hat{\theta})$ increases monotonically with increasing $\alpha \in [0, \infty)$. Because the $x_i$'s are from data streams, we apply Compressed Counting for the $\alpha$th moment. Suppose we consider that the difference in the estimation accuracy at different $\alpha$ is not important (because $D$ is large). Then we simply let $\alpha = 1$. In case we need to estimate two parameters, we might choose $\alpha = 1$ and another $\alpha$ close to 1.
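Solving $\Gamma(\alpha + \theta)/\Gamma(\theta) = m$ for $\theta$, where $m$ is the empirical $\alpha$th moment, has no closed form for general $\alpha$ but is a one-dimensional monotone root-finding problem. The sketch below uses bisection on the log scale; the function name `solve_theta`, the bracketing interval and the iteration count are assumptions chosen for illustration, and the input moment here is exact rather than estimated from a stream.

```python
import math


def solve_theta(moment, alpha, lo=1e-8, hi=1e8, iters=200):
    """Solve Gamma(alpha + theta) / Gamma(theta) = moment for theta > 0.

    The left side increases with theta when alpha > 0, so bisection applies.
    The bracket [lo, hi] is an assumption and must contain the root.
    """
    target = math.log(moment)

    def h(theta):
        return math.lgamma(alpha + theta) - math.lgamma(theta) - target

    for _ in range(iters):
        mid = math.sqrt(lo * hi)       # geometric midpoint of the bracket
        if h(mid) < 0:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)


theta_true, alpha = 2.5, 0.9
m = math.exp(math.lgamma(alpha + theta_true) - math.lgamma(theta_true))
print(solve_theta(m, alpha))           # recovers a value close to 2.5
print(solve_theta(3.0, 1.0))           # alpha = 1: the moment equals theta
```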

Now suppose we actually care about both the estimation accuracy (which favors smaller $\alpha$) and the computational efficiency (which favors $\alpha = 1$); we then need to balance this trade-off by choosing $\alpha$. To do so, we need to know the precise behavior of Compressed Counting in the neighborhood of $\alpha = 1$, as well as the precise behavior of $\hat{\theta}$, i.e., its tail bounds (not just variance). Thus, our analysis on the convergence rates in Lemma 5 will be very useful.

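B Proof of Lemma 1

Assume $Z \sim S(\alpha, \beta, F_{(\alpha)})$. To prove $E\left(|Z|^{\lambda}\right)$ for $-1 < \lambda < \alpha$, [21, Theorem 2.6.3] provided only a partial answer:

$$\int_0^{\infty} z^{\lambda} f_Z(z; \alpha, \beta_B, F_{(\alpha)})\, dz = F_{(\alpha)}^{\lambda/\alpha} \frac{\sin(\pi\rho\lambda)}{\sin(\pi\lambda)} \frac{\Gamma\left(1 - \frac{\lambda}{\alpha}\right)}{\Gamma(1 - \lambda)} \left(\cos\left(\pi\beta_B\kappa(\alpha)/2\right)\right)^{-\lambda/\alpha},$$

where we denote

$$\kappa(\alpha) = \alpha \text{ if } \alpha < 1, \text{ and } \kappa(\alpha) = 2 - \alpha \text{ if } \alpha > 1,$$

and according to the parametrization used in [21, I.19, I.28]:

$$\beta_B = \frac{2}{\pi\kappa(\alpha)} \tan^{-1}\left(\beta \tan\left(\frac{\pi\alpha}{2}\right)\right), \qquad \rho = \frac{1 - \beta_B\kappa(\alpha)/\alpha}{2}.$$

Note that

$$\cos^{-2}\left(\pi\beta_B\kappa(\alpha)/2\right) = 1 + \tan^2\left(\pi\beta_B\kappa(\alpha)/2\right) = 1 + \tan^2\left(\tan^{-1}\left(\beta \tan\left(\frac{\pi\alpha}{2}\right)\right)\right) = 1 + \beta^2 \tan^2\left(\frac{\pi\alpha}{2}\right).$$

Therefore, for $-1 < \lambda < \alpha$,

$$\int_0^{\infty} z^{\lambda} f_Z(z; \alpha, \beta_B, F_{(\alpha)})\, dz = F_{(\alpha)}^{\lambda/\alpha} \frac{\sin(\pi\rho\lambda)}{\sin(\pi\lambda)} \frac{\Gamma\left(1 - \frac{\lambda}{\alpha}\right)}{\Gamma(1 - \lambda)} \left(1 + \beta^2 \tan^2\left(\frac{\pi\alpha}{2}\right)\right)^{\frac{\lambda}{2\alpha}}.$$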

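To compute $E\left(|Z|^{\lambda}\right)$, we take advantage of a useful property of the stable density function [21, page 65]:

$$f_Z(-z; \alpha, \beta_B, F_{(\alpha)}) = f_Z(z; \alpha, -\beta_B, F_{(\alpha)}).$$

$$E\left(|Z|^{\lambda}\right) = \int_{-\infty}^{0} (-z)^{\lambda} f_Z(z; \alpha, \beta_B, F_{(\alpha)})\, dz + \int_0^{\infty} z^{\lambda} f_Z(z; \alpha, \beta_B, F_{(\alpha)})\, dz$$

$$= \int_0^{\infty} z^{\lambda} f_Z(z; \alpha, -\beta_B, F_{(\alpha)})\, dz + \int_0^{\infty} z^{\lambda} f_Z(z; \alpha, \beta_B, F_{(\alpha)})\, dz$$

$$= F_{(\alpha)}^{\lambda/\alpha} \frac{\Gamma\left(1 - \frac{\lambda}{\alpha}\right)}{\sin(\pi\lambda) \Gamma(1 - \lambda)} \left(1 + \beta^2 \tan^2\left(\frac{\pi\alpha}{2}\right)\right)^{\frac{\lambda}{2\alpha}} \times \left(\sin\left(\pi\lambda \frac{1 + \beta_B\kappa(\alpha)/\alpha}{2}\right) + \sin\left(\pi\lambda \frac{1 - \beta_B\kappa(\alpha)/\alpha}{2}\right)\right)$$

$$= F_{(\alpha)}^{\lambda/\alpha} \frac{\Gamma\left(1 - \frac{\lambda}{\alpha}\right)}{\sin(\pi\lambda) \Gamma(1 - \lambda)} \left(1 + \beta^2 \tan^2\left(\frac{\pi\alpha}{2}\right)\right)^{\frac{\lambda}{2\alpha}} 2 \sin\left(\frac{\pi\lambda}{2}\right) \cos\left(\frac{\pi\lambda}{2} \beta_B\kappa(\alpha)/\alpha\right)$$

$$= F_{(\alpha)}^{\lambda/\alpha} \frac{\Gamma\left(1 - \frac{\lambda}{\alpha}\right)}{\cos(\pi\lambda/2) \Gamma(1 - \lambda)} \left(1 + \beta^2 \tan^2\left(\frac{\pi\alpha}{2}\right)\right)^{\frac{\lambda}{2\alpha}} \cos\left(\frac{\lambda}{\alpha} \tan^{-1}\left(\beta \tan\left(\frac{\pi\alpha}{2}\right)\right)\right)$$

$$= F_{(\alpha)}^{\lambda/\alpha} \left(1 + \beta^2 \tan^2\left(\frac{\pi\alpha}{2}\right)\right)^{\frac{\lambda}{2\alpha}} \cos\left(\frac{\lambda}{\alpha} \tan^{-1}\left(\beta \tan\left(\frac{\pi\alpha}{2}\right)\right)\right) \times \frac{2}{\pi} \sin\left(\frac{\pi}{2}\lambda\right) \Gamma\left(1 - \frac{\lambda}{\alpha}\right) \Gamma(\lambda),$$

which can be simplified when $\beta = 1$, to be

$$F_{(\alpha)}^{\lambda/\alpha} \frac{\cos\left(\frac{\kappa(\alpha)}{\alpha} \frac{\lambda\pi}{2}\right)}{\cos^{\lambda/\alpha}\left(\frac{\kappa(\alpha)\pi}{2}\right)} \frac{2}{\pi} \sin\left(\frac{\pi}{2}\lambda\right) \Gamma\left(1 - \frac{\lambda}{\alpha}\right) \Gamma(\lambda).$$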



The final task is to show that when $\alpha < 1$ and $\beta = 1$, $E\left(|Z|^{\lambda}\right)$ exists for all $-\infty < \lambda < \alpha$, not just $-1 < \lambda < \alpha$. This is an extremely useful property.

Note that when $\alpha < 1$ and $\beta = 1$, $Z$ is always non-negative. As shown in the proof of [21, Theorem 2.6.3],



E ( \Z\ X ) = F A/ ° cos~ A/o 



7ra \ 1 



t Jo Jo 



cxp ( — zzt cxp( V— 1tt/2) — u a cxpf — V — Ittq/2) H | dudz 



7ra \ 1 
2 



Im 



■to Jo 



z exp (— zuyj — 1 — u a exp(— V— l7ra/2)) V — ldudz. 



The only thing we need to check is that in the proof of ||2T1 
Theorem 2.6.3], the condition for Fubini's theorem (to ex- 
change order of integration) still holds when — oo < a < 1, 
(3 = 1, and A < —1. We can show 

/ / \z X cxp — u a cxp(— \/— l7ra/2)) v^— T dudz 

Jo Jo ' ' 

— / z X |cxp (— u Q cos(7ra/2) + — lu a sin(7ra/2)) | dudz 

Jo Jo 

/ z cxp (— u a cos(7ra/2)) dudz < oo, 

provided A < -1 (A ^ -1, -2, -3, ....) and cos(7ra/2) > 
0, i.e., a < 1. Note that | exp(\/— lcc) | = 1 always and 
Euler's formula: exp(V— lx) = cos(a;) + yJ— \ sin(x) is 
frequently used to simplify the algebra. 

Once we show that Fubini's condition is satisfied, we can 
exchange the order of integration and the rest follows from 
the proof of ||2~T1 Theorem 2.6.3]. Because of continuity, the 
"singularity points" A = — 1, —2, —3, ... do not matter. 



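C Proof of Lemma 3

We first show that, for any fixed $t$, as $k \to \infty$,

$$E\left(\left(\hat{F}_{(\alpha),gm}\right)^{t}\right) = F_{(\alpha)}^{t} \frac{\cos^{k}\left(\frac{\kappa(\alpha)\pi}{2k} t\right)}{\cos^{kt}\left(\frac{\kappa(\alpha)\pi}{2k}\right)} \frac{\left[\frac{2}{\pi} \sin\left(\frac{\pi\alpha}{2k} t\right) \Gamma\left(1 - \frac{t}{k}\right) \Gamma\left(\frac{\alpha}{k} t\right)\right]^{k}}{\left[\frac{2}{\pi} \sin\left(\frac{\pi\alpha}{2k}\right) \Gamma\left(1 - \frac{1}{k}\right) \Gamma\left(\frac{\alpha}{k}\right)\right]^{kt}}$$

$$= F_{(\alpha)}^{t} \exp\left(\frac{1}{k} \frac{\pi^2 (t^2 - t)}{24} \left(\alpha^2 + 2 - 3\kappa^2(\alpha)\right) + O\left(\frac{1}{k^2}\right)\right).$$

In [14], it was proved that, as $k \to \infty$,

$$\frac{\left[\frac{2}{\pi} \sin\left(\frac{\pi\alpha}{2k} t\right) \Gamma\left(1 - \frac{t}{k}\right) \Gamma\left(\frac{\alpha}{k} t\right)\right]^{k}}{\left[\frac{2}{\pi} \sin\left(\frac{\pi\alpha}{2k}\right) \Gamma\left(1 - \frac{1}{k}\right) \Gamma\left(\frac{\alpha}{k}\right)\right]^{kt}} = 1 + \frac{1}{k} \frac{\pi^2 (t^2 - t)}{24} \left(\alpha^2 + 2\right) + O\left(\frac{1}{k^2}\right) = \exp\left(\frac{1}{k} \frac{\pi^2 (t^2 - t)}{24} \left(\alpha^2 + 2\right) + O\left(\frac{1}{k^2}\right)\right).$$

Using the infinite product representation of cosine [7, 1.43.3],

$$\cos(z) = \prod_{s=0}^{\infty} \left(1 - \frac{4z^2}{(2s + 1)^2 \pi^2}\right),$$

we can rewrite

$$\frac{\cos^{k}\left(\frac{\kappa(\alpha)\pi}{2k} t\right)}{\cos^{kt}\left(\frac{\kappa(\alpha)\pi}{2k}\right)} = \prod_{s=0}^{\infty} \left(1 - \frac{\kappa^2(\alpha) t^2}{(2s + 1)^2 k^2}\right)^{k} \left(1 - \frac{\kappa^2(\alpha)}{(2s + 1)^2 k^2}\right)^{-kt}$$

$$= \prod_{s=0}^{\infty} \left(1 - \frac{\kappa^2(\alpha)(t^2 - t)}{(2s + 1)^2 k} + O\left(\frac{1}{k^2}\right)\right) = \exp\left(\sum_{s=0}^{\infty} \log\left(1 - \frac{\kappa^2(\alpha)(t^2 - t)}{(2s + 1)^2 k} + O\left(\frac{1}{k^2}\right)\right)\right)$$

$$= \exp\left(-\frac{\kappa^2(\alpha)}{k} (t^2 - t) \sum_{s=0}^{\infty} \frac{1}{(2s + 1)^2} + O\left(\frac{1}{k^2}\right)\right) = \exp\left(-\frac{\kappa^2(\alpha)}{k} (t^2 - t) \frac{\pi^2}{8} + O\left(\frac{1}{k^2}\right)\right),$$

which, combined with the result in [14], yields the desired expression.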



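The next task is to show

$$\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right) \frac{2}{\pi} \Gamma\left(\frac{\alpha}{k}\right) \Gamma\left(1 - \frac{1}{k}\right) \sin\left(\frac{\pi}{2} \frac{\alpha}{k}\right)\right]^{k} \to \exp\left(-\gamma_e(\alpha - 1)\right),$$

monotonically as $k \to \infty$, where $\gamma_e = 0.577215665...$ is Euler's constant. In [14], it was proved that, as $k \to \infty$,

$$\left[\frac{2}{\pi} \Gamma\left(\frac{\alpha}{k}\right) \Gamma\left(1 - \frac{1}{k}\right) \sin\left(\frac{\pi}{2} \frac{\alpha}{k}\right)\right]^{k} \to \exp\left(-\gamma_e(\alpha - 1)\right),$$

monotonically. In this study, we need to consider instead

$$\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right) \frac{2}{\pi} \Gamma\left(\frac{\alpha}{k}\right) \Gamma\left(1 - \frac{1}{k}\right) \sin\left(\frac{\pi}{2} \frac{\alpha}{k}\right)\right]^{k} = \left[2 \cos\left(\frac{\kappa(\alpha)\pi}{2k}\right) \frac{\Gamma\left(\frac{\alpha}{k}\right) \sin\left(\frac{\pi\alpha}{2k}\right)}{\Gamma\left(\frac{1}{k}\right) \sin\left(\frac{\pi}{k}\right)}\right]^{k}. \quad (6)$$

(Note Euler's reflection formula $\Gamma(z)\Gamma(1 - z) = \frac{\pi}{\sin(\pi z)}$.)

The additional term $\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right)\right]^{k} = 1 - O\left(\frac{1}{k}\right)$. Therefore,

$$\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right) \frac{2}{\pi} \Gamma\left(\frac{\alpha}{k}\right) \Gamma\left(1 - \frac{1}{k}\right) \sin\left(\frac{\pi}{2} \frac{\alpha}{k}\right)\right]^{k} \to \exp\left(-\gamma_e(\alpha - 1)\right).$$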



To show the monotonicity, however, we have to use some 
different techniques from [14]. The reason is because the ad- 



ditional term 



cos 



k(ck) 7r 
2k 



increases (instead of decreas- 



ing) monotonically with increasing k. 

First, we consider a > 1, i.e., k(o) = 2 — a < 1. For 
simplicity, we take logarithm of (O and replace 1/k by t, 
where < t < 1/2 (recall k > 2). It suffices to show that 
g(t) increases with increasing t G [0, 1/2], where 

g(t) = -W{t), 

W(t) = log (cos l-±-J-t \ 1 + log (r (at)) + log (sin ( — t 
- log (r (t)) - log (sin (tt*)) + log(2). 

Because = \W(t) - ^W{t), to show g'(t) > in 
t € [0, 1/2], it suffices to show 

tW'(t) - W(t) > 0. 

One can check that tW'(t) ->■ and W(t) -> 0, as t -> 0+. 



w'(t) 



'ft(a)77 
tan c 



+ (at) a + 



a7r 

tan(^-t) V 2 



tan(7rt) 



Here ^(a;) = dl ° s ^ x ^ j s the "Psi" function. Therefore, to 
show tW'(t) - W(t) > 0, it suffices to show that tW'(t) - 
W(t) is an increasing function of t € [0, 1/2], i.e., 



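$\left(tW'(t) - W(t)\right)' = tW''(t) \ge 0$, i.e.,

$$W''(t) = -\left(\frac{\kappa(\alpha)\pi}{2}\right)^2\sec^2\left(\frac{\kappa(\alpha)\pi}{2}t\right) + \psi'(\alpha t)\alpha^2 - \left(\frac{\pi\alpha}{2}\right)^2\csc^2\left(\frac{\pi\alpha}{2}t\right) - \psi'(t) + \pi^2\csc^2(\pi t) \ge 0.$$

Using the series representation of $\psi(x)$ [7, 8.363.8], we show

$$\psi'(\alpha t)\alpha^2 - \psi'(t) = \sum_{s=0}^\infty \frac{\alpha^2}{(\alpha t + s)^2} - \sum_{s=0}^\infty\frac{1}{(t+s)^2} = \sum_{s=0}^\infty\left(\frac{1}{(t + s/\alpha)^2} - \frac{1}{(t+s)^2}\right) \ge 0,$$

because we consider $\alpha > 1$. Thus, it suffices to show that

$$Q(t;\alpha) = -\sec^2\left(\frac{\kappa(\alpha)\pi}{2}t\right)\left(\frac{\kappa(\alpha)\pi}{2}\right)^2 - \csc^2\left(\frac{\pi\alpha}{2}t\right)\left(\frac{\pi\alpha}{2}\right)^2 + \csc^2(\pi t)\pi^2 \ge 0.$$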

To show $Q(t;\alpha) \ge 0$, we can treat $Q(t;\alpha)$ as a function of
$\alpha$ (for fixed $t$). Because both […] and […] are convex
functions of $x \in [0, \pi/2]$, we know $Q(t;\alpha)$ is a concave
function of $\alpha$ (for fixed $t$). It is easy to check that

$$\lim_{\alpha\to1+} Q(t;\alpha) = 0, \qquad \lim_{\alpha\to2-} Q(t;\alpha) = 0.$$

Because $Q(t;\alpha)$ is concave in $\alpha \in [1, 2]$, we must have
$Q(t;\alpha) \ge 0$; and consequently, $W''(t) \ge 0$ and $g'(t) \ge 0$.
Therefore, we have proved that (6) decreases monotonically
with increasing $k$, when $1 < \alpha < 2$.
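As a numerical sanity check of this step, the sketch below evaluates $Q(t;\alpha) = -\left(\frac{\kappa\pi}{2}\right)^2\sec^2\left(\frac{\kappa\pi}{2}t\right) - \left(\frac{\pi\alpha}{2}\right)^2\csc^2\left(\frac{\pi\alpha}{2}t\right) + \pi^2\csc^2(\pi t)$ with $\kappa = 2-\alpha$ on a grid. The grid bounds and the function name `Q` are illustrative assumptions; a grid check supports, but does not replace, the concavity argument.

```python
import math


def Q(t, alpha):
    # Q(t; alpha) for 1 < alpha < 2, where kappa(alpha) = 2 - alpha
    kap = 2.0 - alpha
    sec2 = 1.0 / math.cos(kap * math.pi * t / 2) ** 2
    csc2_alpha = 1.0 / math.sin(math.pi * alpha * t / 2) ** 2
    csc2 = 1.0 / math.sin(math.pi * t) ** 2
    return (-(kap * math.pi / 2) ** 2 * sec2
            - (math.pi * alpha / 2) ** 2 * csc2_alpha
            + math.pi ** 2 * csc2)


ts = [0.01 * i for i in range(1, 51)]          # t in (0, 1/2]
alphas = [1.0 + 0.05 * i for i in range(1, 20)]  # alpha in (1, 2)
min_q = min(Q(t, a) for t in ts for a in alphas)
print("min Q on grid:", min_q)
print("near alpha = 1:", Q(0.3, 1 + 1e-6), " near alpha = 2:", Q(0.3, 2 - 1e-6))
```

The minimum over the grid stays positive, and the values near $\alpha = 1$ and $\alpha = 2$ are close to zero, consistent with the two limits stated above.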

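For $\alpha < 1$ (i.e., $\kappa(\alpha) = \alpha < 1$), we prove the monotonicity by a different technique. First, using infinite-product representations [7, 8.322, 1.431.1],

$$\Gamma(z) = \frac{\exp(-\gamma_e z)}{z}\prod_{s=1}^\infty\left(1+\frac{z}{s}\right)^{-1}\exp\left(\frac{z}{s}\right), \qquad \sin(z) = z\prod_{s=1}^\infty\left(1 - \frac{z^2}{s^2\pi^2}\right),$$

we can rewrite (6) as

$$\left[2\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right)\frac{\Gamma\left(\frac{\alpha}{k}\right)\sin\left(\frac{\pi\alpha}{2k}\right)}{\Gamma\left(\frac{1}{k}\right)\sin\left(\frac{\pi}{k}\right)}\right]^k = \exp\left(-\gamma_e(\alpha-1)\right)\prod_{s=1}^\infty\left[\exp\left(\frac{\alpha-1}{ks}\right)\frac{1-\frac{\alpha}{ks}}{1-\frac{1}{ks}}\right]^k.$$

To show its monotonicity, it suffices to show for any $s \ge 1$ that

$$\left[\frac{1-\frac{\alpha}{ks}}{1-\frac{1}{ks}}\right]^k$$

decreases monotonically, which is equivalent to showing the monotonicity of $g(t)$ with increasing $t$, for $t \ge 2$ (with $t = ks$), where

$$g(t) = t\log\left(1 + \frac{1-\alpha}{t-1}\right) = t\log\left(\frac{t-\alpha}{t-1}\right).$$

It is straightforward to show that $t\log\left(\frac{t-\alpha}{t-1}\right)$ is monotonically decreasing with increasing $t$ ($t \ge 2$), for $\alpha < 1$.

To this end, we have proved that for $0 < \alpha < 2$ ($\alpha \ne 1$),

$$\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha}{k}\right)\Gamma\left(1-\frac{1}{k}\right)\sin\left(\frac{\pi\alpha}{2k}\right)\right]^k \to \exp\left(-\gamma_e(\alpha-1)\right),$$

monotonically with increasing $k$ ($k \ge 2$).

D Proof of Lemma 4

We first find the constant $G_{R,gm}$ in the right tail bound

$$\Pr\left(\hat{F}_{(\alpha),gm} - F_{(\alpha)} \ge \epsilon F_{(\alpha)}\right) \le \exp\left(-k\frac{\epsilon^2}{G_{R,gm}}\right), \quad \epsilon > 0.$$

For $0 < t < k$, the Markov moment bound yields

$$\Pr\left(\hat{F}_{(\alpha),gm} - F_{(\alpha)} \ge \epsilon F_{(\alpha)}\right) \le \frac{E\left(\hat{F}^t_{(\alpha),gm}\right)}{(1+\epsilon)^tF^t_{(\alpha)}} \le \frac{\left[\cos\left(\frac{\kappa(\alpha)\pi}{2k}t\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha t}{k}\right)\Gamma\left(1-\frac{t}{k}\right)\sin\left(\frac{\pi\alpha t}{2k}\right)\right]^k}{(1+\epsilon)^t\exp\left(-t\gamma_e(\alpha-1)\right)}.$$

We need to find the $t$ that minimizes the upper bound. For convenience, we consider its logarithm, i.e.,

$$g(t) = t\gamma_e(\alpha-1) - t\log(1+\epsilon) + k\log\left(\cos\left(\frac{\kappa(\alpha)\pi}{2k}t\right)\frac{2}{\pi}\Gamma\left(\frac{\alpha t}{k}\right)\Gamma\left(1-\frac{t}{k}\right)\sin\left(\frac{\pi\alpha t}{2k}\right)\right),$$

whose first and second derivatives (with respect to $t$) are

$$g'(t) = \gamma_e(\alpha-1) - \log(1+\epsilon) - \frac{\kappa(\alpha)\pi}{2}\tan\left(\frac{\kappa(\alpha)\pi}{2k}t\right) + \frac{\alpha\pi/2}{\tan\left(\frac{\alpha\pi}{2k}t\right)} + \psi\left(\frac{\alpha t}{k}\right)\alpha - \psi\left(1-\frac{t}{k}\right),$$

$$kg''(t) = -\left(\frac{\kappa(\alpha)\pi}{2}\right)^2\sec^2\left(\frac{\kappa(\alpha)\pi}{2k}t\right) - \left(\frac{\alpha\pi}{2}\right)^2\csc^2\left(\frac{\alpha\pi}{2k}t\right) + \psi'\left(\frac{\alpha t}{k}\right)\alpha^2 + \psi'\left(1-\frac{t}{k}\right).$$

We need to show that $g(t)$ is a convex function. By the following expansions [7, 1.422.2, 1.422.4, 8.363.8]:

$$\sec^2\left(\frac{\pi x}{2}\right) = \frac{4}{\pi^2}\sum_{j=1}^\infty\left(\frac{1}{(2j-1-x)^2} + \frac{1}{(2j-1+x)^2}\right),$$

$$\csc^2(\pi x) = \frac{1}{\pi^2x^2} + \frac{1}{\pi^2}\sum_{j=1}^\infty\left(\frac{1}{(x-j)^2} + \frac{1}{(x+j)^2}\right), \qquad \psi'(x) = \sum_{j=0}^\infty\frac{1}{(x+j)^2},$$

we can rewrite

$$kg''(t) = -\kappa^2\sum_{j=1}^\infty\left(\frac{1}{(2j-1-\kappa t/k)^2} + \frac{1}{(2j-1+\kappa t/k)^2}\right) - \alpha^2\sum_{j=1}^\infty\left(\frac{1}{(\alpha t/k-2j)^2} + \frac{1}{(\alpha t/k+2j)^2}\right) + \alpha^2\sum_{j=1}^\infty\frac{1}{(\alpha t/k+j)^2} + \sum_{j=1}^\infty\frac{1}{(j-t/k)^2}.$$

If $\alpha < 1$, i.e., $\kappa(\alpha) = \alpha$, then

$$kg''(t) = -\alpha^2\sum_{j=1}^\infty\left(\frac{1}{(\alpha t/k-j)^2} + \frac{1}{(\alpha t/k+j)^2}\right) + \alpha^2\sum_{j=1}^\infty\frac{1}{(\alpha t/k+j)^2} + \sum_{j=1}^\infty\frac{1}{(j-t/k)^2}$$

$$= \sum_{j=1}^\infty\frac{1}{(j-t/k)^2} - \sum_{j=1}^\infty\frac{\alpha^2}{(j-\alpha t/k)^2} \ge \sum_{j=1}^\infty\frac{1-\alpha^2}{(j-t/k)^2} \ge 0,$$

because $\alpha < 1$ and $0 < t < k$.

If $\alpha > 1$, i.e., $\kappa(\alpha) = 2-\alpha < 1$, then

$$kg''(t) = -\kappa^2\sum_{j=1}^\infty\left(\frac{1}{(2j-1-\kappa t/k)^2} + \frac{1}{(2j-1+\kappa t/k)^2}\right) - \alpha^2\sum_{j=1}^\infty\frac{1}{(2j-\alpha t/k)^2}$$

$$+ \alpha^2\sum_{j=1}^\infty\frac{1}{(\alpha t/k+2j-1)^2} + \sum_{j=1}^\infty\frac{1}{(2j-t/k)^2} + \sum_{j=1}^\infty\frac{1}{(2j-1-t/k)^2} \ge 0, \quad (\text{because } \alpha > \kappa).$$

Since we have proved that $g''(t) \ge 0$, i.e., $g(t)$ is a convex
function, one can find the optimal $t$ by solving $g'(t) = 0$: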

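$$\gamma_e(\alpha-1) - \log(1+\epsilon) - \frac{\kappa(\alpha)\pi}{2}\tan\left(\frac{\kappa(\alpha)\pi}{2k}t\right) + \frac{\alpha\pi/2}{\tan\left(\frac{\alpha\pi}{2k}t\right)} + \psi\left(\frac{\alpha t}{k}\right)\alpha - \psi\left(1-\frac{t}{k}\right) = 0.$$

We let the solution be $t = C_Rk$, where $C_R$ is the solution to

$$\gamma_e(\alpha-1) - \log(1+\epsilon) - \frac{\kappa(\alpha)\pi}{2}\tan\left(\frac{\kappa(\alpha)\pi}{2}C_R\right) + \frac{\alpha\pi/2}{\tan\left(\frac{\alpha\pi}{2}C_R\right)} + \psi\left(\alpha C_R\right)\alpha - \psi\left(1-C_R\right) = 0.$$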

Alternatively, we can seek a "sub-optimal" (but asymptotically optimal) solution using the asymptotic expression
for $E\left(\hat{F}^t_{(\alpha),gm}\right)$ in Lemma 3, i.e., the $t$ that minimizes

$$(1+\epsilon)^{-t}\exp\left(\frac{1}{k}\left(t^2-t\right)\left(2+\alpha^2-3\kappa^2(\alpha)\right)\frac{\pi^2}{24}\right),$$

whose minimum is attained at

$$t = k\frac{\log(1+\epsilon)}{\left(2+\alpha^2-3\kappa^2(\alpha)\right)\pi^2/12} + \frac{1}{2}.$$

This approximation can be useful, e.g., for providing the initial guess for $C_R$ in a numerical procedure.
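One possible numerical procedure is sketched below. It is an illustration under stated assumptions, not the procedure used by the authors: instead of solving $g'(t) = 0$ with a root finder (which would need a digamma routine), it minimizes the convex function $h(C) = g(Ck)/k$ over $C \in (0, 1)$ by ternary search, using only `math.lgamma`. The names `h` and `solve_right_tail` are hypothetical, and the constant is then read off as $\epsilon^2/G_{R,gm} = -h(C_R)$.

```python
import math

EULER_GAMMA = 0.5772156649015329  # gamma_e


def kappa(alpha):
    return alpha if alpha < 1 else 2.0 - alpha


def h(c, alpha, eps):
    # h(c) = g(c*k)/k: per-sample logarithm of the right-tail bound at t = c*k
    kap = kappa(alpha)
    return (c * EULER_GAMMA * (alpha - 1) - c * math.log1p(eps)
            + math.log(math.cos(kap * math.pi * c / 2))
            + math.log(2 / math.pi)
            + math.lgamma(alpha * c) + math.lgamma(1 - c)
            + math.log(math.sin(math.pi * alpha * c / 2)))


def solve_right_tail(alpha, eps, iters=200):
    # Ternary search for the minimizer C_R of the convex function h on (0, 1)
    lo, hi = 1e-9, 1 - 1e-9
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if h(m1, alpha, eps) < h(m2, alpha, eps):
            hi = m2
        else:
            lo = m1
    c_r = 0.5 * (lo + hi)
    g_r = eps ** 2 / (-h(c_r, alpha, eps))  # eps^2 / G_R = -h(C_R)
    return c_r, g_r


for alpha in (0.99, 1.5):
    print(alpha, solve_right_tail(alpha, eps=0.5))
```

For $\alpha$ close to 1 the minimizer sits close to 1, in line with the later claim that $C_R \to 1$ as $\alpha \to 1$.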

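Assuming we know $C_R$ (e.g., by a numerical procedure),
we can then express the right tail bound as

$$\Pr\left(\hat{F}_{(\alpha),gm} - F_{(\alpha)} \ge \epsilon F_{(\alpha)}\right) \le \exp\left(-k\frac{\epsilon^2}{G_{R,gm}}\right), \qquad \frac{\epsilon^2}{G_{R,gm}} = -\frac{g(C_Rk)}{k}.$$

Next, we find the constant $G_{L,gm}$ in the left tail bound

$$\Pr\left(\hat{F}_{(\alpha),gm} - F_{(\alpha)} \le -\epsilon F_{(\alpha)}\right) \le \exp\left(-k\frac{\epsilon^2}{G_{L,gm}}\right), \quad 0 < \epsilon < 1.$$

From Lemma 3 we know that, for any $t$, where $0 < t < k/\alpha$ if $\alpha > 1$ and $t > 0$ if $\alpha < 1$, the Markov moment bound applied to $\hat{F}^{-t}_{(\alpha),gm}$ yields an upper bound
whose minimum is attained at $t = C_Lk$ (we skip the proof
of convexity) such that

$$\log(1-\epsilon) - \gamma_e(\alpha-1) - \frac{\kappa(\alpha)\pi}{2}\tan\left(\frac{\kappa(\alpha)\pi}{2}C_L\right) + \frac{\alpha\pi}{2}\tan\left(\frac{\alpha\pi}{2}C_L\right) - \psi(\alpha C_L)\alpha + \psi(C_L) = 0.$$

Thus, we show the left tail bound, with

$$\frac{\epsilon^2}{G_{L,gm}} = -C_L\log(1-\epsilon) + C_L\gamma_e(\alpha-1) + \log\alpha - \log\left(\frac{\cos\left(\frac{\kappa(\alpha)\pi}{2}C_L\right)\Gamma(C_L)}{\cos\left(\frac{\pi\alpha}{2}C_L\right)\Gamma(\alpha C_L)}\right).$$

E Proof of Lemma 5

First, we consider the right bound. From Lemma 4,

$$\frac{\epsilon^2}{G_{R,gm}} = C_R\log(1+\epsilon) - C_R\gamma_e(\alpha-1) - \log\left(\cos\left(\frac{\kappa(\alpha)\pi}{2}C_R\right)\frac{2}{\pi}\Gamma(\alpha C_R)\Gamma(1-C_R)\sin\left(\frac{\pi\alpha}{2}C_R\right)\right),$$

and $C_R$ is the solution to $g_1(C_R, \alpha, \epsilon) = 0$,

$$g_1(C_R,\alpha,\epsilon) = -\gamma_e(\alpha-1) + \log(1+\epsilon) + \frac{\kappa(\alpha)\pi}{2}\tan\left(\frac{\kappa(\alpha)\pi}{2}C_R\right) - \frac{\alpha\pi/2}{\tan\left(\frac{\alpha\pi}{2}C_R\right)} - \psi(\alpha C_R)\alpha + \psi(1-C_R).$$

Using series representations in [7, 1.421.1, 1.421.3, 8.362.1], we rewrite $g_1$ in series form.
From Lemma 4 we know $g_1 = 0$ has a unique well-defined
solution for $C_R \in (0,1)$. We need to analyze this term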
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which, when a — > 1 (i.e., k — > 1), must approach a finite Also 

limit. In other words, Cr — ► 1, at the rate O { V~A] , i.e., , (i + aC R )(i - C R ) , l + aC R , l - kCh 

1 - k 2 C 2 1 + kCr 1 - Cr 



1 + y V l-c 



By Euler's reflection formula and series representations,

cos (£2£fi.) r(l - aC s ) 



= - y/A log(l + e) + o(VA) . 

and 



1 V «» r (i - c«) / £ i og A gjffl = £ log 1 + ^ + lor 1 " w 



-^, 1 + ^| +H .-^,-0 M . 



3 = 1 



ox P ( 7c (a - i)c„) 1 ~ Cfl nfi- - Cj? K Cr ] Therefore, for a > 1, we also have 

V ™ ' ^ 1 - <xCr ^ (2j + l) 2 A (2j + l) 2 J 



^ ::p ( ('-°) C M ft + i— ^) (l 1 1 -" C " )' 1 ^— = Iog(l + e) - 2 V /Alcg (1 + c) + o (VA) . 



3=1 



In other words, as $\alpha \to 1$, the constant $G_{R,gm}$ converges to



;OXp(7c(Q - 1)Ci?) T^T^cI ^ (27+T)2 J Egfr+i) at the rate O (/A 

k 2 c 2 ^ _1 / (1 - a)c H \ / + i-c n \ ( + 1 - aC R \ - 1 Next, we consider the left bound. From Lemma 0] 



(2j + v V i A i A j / 



Pr (F (Q) , sm , b - F (a) < -eF (a) < oxp -fc 



taking logarithm of which yields ^ ' y \ G tijm 



, cos^jril-o^) (HoChKI-Ch) 
log t = 7 e (a - l)C fl + lor 



where 



/«iC H \ r ,. „ , ' " b 1 - k 2 C 2 e 2 

(.-T^J T(l - C fl ) = -Cz, log(l - 6) + C £7 e(a-1) + log a 

K(a)7T \ \ / f ttcxCl 



+ j2 iog ) (2 ; +1 2 ( + ( (l ~ a)Cfl ^ + iog ; \ ■ ~ log [ cos { 2 ' ' I r 1 ' ''- 5 1 + '• " 1 1; 1 ' 1 ' 1 : ' • " 



0>s+iy 

If $\alpha < 1$, i.e., $\kappa = \alpha = 1 - \Delta$, then

cos(^^i)r(l- Q C H ) 



,(s°a) r(i-c R ) 



and Cl is the solution to giiChi o-, e) = 0, 

K(a)7T / K(a)7T 

§2{C L ,a, e) = log(l - e) - 7 e (a - 1) — tan I — - — ( 

+ —tan — C L - tp ("Cl) a + ip (C L ) = 0. 



_ ACn + ion 1 C ' H + f C 1 a )C-R \ , _v j / Using series representations, we rewrite 02 as 

I-cCr >■ I I- ■ ^-Co'i 



3=1 



7e Ac R -i og (i + T ^-) + 5:i(i^^) -i(i^) ... " " " 2 - tzw-ir-wr.)' 



K7r 4kCl 1 
l(l-aC R \ 2 1/1-Cn\ 2 S2 = -7e(«- 1) + log(l -e) - — — ■ 



1-CrJ f^2 



air 4oCl ^ 1 

= - 7 eACfl - log ( 1 + + ^-C R A(2 - aC R - Cr) + ... + "2 ^~ ^ (2j - l) 2 - («Ci) 2 

1 — Ga / 12 



1 1 

Thus, for a < 1, as c* = l - y^i^ + ° (VS), we obtain - « 1-7. - ^ + (aCx.) Xj -^r 



!- C « 12 I 7 C L + L ^ j{C L +j) 



= log(l + e) - 2y/A log(l + e) + o (Va) 



J ^~ 1 



fr{ (2j - l) 2 - ( K C L ) 2 " ^ (2j - l) 2 - (aCz,) 2 

If a > 1, i.e., a = 1 + A and k = 1 — A, then , °° i 00 i 



cos 

lo. 



cos(^a)r(l-C Ji ) =log(l-e)-K^ 



rj \ 2j - 1 - kCl 2] - 1 + kC l 



A „ (1 + aCR)(l - Cr) " , V 1 Qj7TT?V 1 1 + (1 _ a)c , y ggt+j(l + °0 

, 7c AC R + log 1 _ b2c .2 +2>g-7^ ^W-l-aCL 2j - 1 + aCl + U a, ° £ ^ jCaCx +j)(C i + j) 



We first consider $\alpha = 1 + \Delta > 1$. In order for $g_2 = 0$ to
have a meaningful solution, we must make sure that

-re a 2A 2A 



1-reCi 1-aCt, (1- «Cx,)(l -aC t ) 1 - 2C L + C£ - A 2 C£ 

converges to a finite value as $\alpha \to 1$, i.e., $C_L \to 1$ also. This
provides an approximate solution for $C_L$ when $\alpha > 1$:



F Proof of Lemma 6

Assume $k$ i.i.d. samples $x_j \sim S\left(\alpha < 1, \beta = 1, F_{(\alpha)}\right)$. Using
the $(-\alpha)$th moment in Lemma 1 suggests that



2A 



-0 Va) 



-log(l - e) 

Using series representations, we obtain 

r(aCi)cos(^i) 



r(i+o) 

is an unbiased estimator of d7 %, whose variance is 



Var 



Ci7e(" — 1) + log a + log ■ 



(«)' 

(o T) /2r 2 (l + a) 



-c L J r(c L 

~ 2 ~2 



1 + 



= 1 ° g JQL 



'ACi 



n 



= E 



ACi 



ACi 



+ O(A) + lO; 



1 - c^C| 

1 - K 2 C? 



T(l + 2a) 

We can then estimate F/ a \ by , i.e., 



+ ^log 



K 2c? 



(2.+1) 



= - A /-2A log(l - e) + O (A) . 

Therefore, for a > 1 



- log(l - e) - 2 v '-2Alog(l - e) + o (%/a) 

Finally, we need to consider a < 1. In this case, 

aC L +3(1 + a) 



g 2 =log(l - e) + AC £ ^ 



= log(l-e) + AC L 



£rl 3{aC! +j)(Ci +3) 

1 ^2, 1 



E 



■o(A). 



Using properties of Riemann's Zeta function and Bernoulli
numbers [7, 9.511, 9.521.1, 9.61]



E 



(i + c L ) 2 
1 /■«> 

Wo 



1 r 00 t. 
cI + ./o T 



t exp(— Cit) 
■ cxp( — t) 



dl 



H 1 h ... exp(-C L t)dt = 



C L 



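Using the integral relation [7, 0.244.1] and treating $C_L$ as a
positive integer (which does not affect the asymptotics)

$$\sum_{j=1}^\infty\frac{1}{j(j+C_L)} = \frac{1}{C_L}\int_0^1\frac{1-t^{C_L}}{1-t}dt = \frac{1}{C_L}\int_0^1\left(t^{C_L-1} + t^{C_L-2} + ... + 1\right)dt$$

$$= \frac{1}{C_L}\sum_{j=1}^{C_L}\frac{1}{j} = \frac{1}{C_L}\left(\gamma_e + \log C_L + O\left(C_L^{-1}\right)\right).$$

Thus, the solution to $g_2 = 0$ can be approximated by

$$\log(1-\epsilon) + \Delta\left(1 + \gamma_e + \log C_L\right) + o(\Delta) = 0.$$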

Again, using series representations, we obtain 

T(aC L ) 



Ci7e( a — 1) + log a + log 



=io g n 



i + 



- exp 



=E 



;=! i + • 

ACi ACi 



r(Ci) 

AC, 
3 



+ ... 



fri\j + c L j 

= -AC L (7= + logCi) + ... 

Combining the results, we obtain, when $\alpha < 1$ and $\Delta \to 0$,



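which is biased at the order $O\left(\frac{1}{k}\right)$. To remove the $O\left(\frac{1}{k}\right)$
term of the bias, we recommend a bias-corrected version obtained by Taylor expansions (Theorem 6.1.1 of the cited reference). Writing $\hat{R}_{(\alpha)}$ for the unbiased estimator of $1/F_{(\alpha)}$ above,

$$E\left(\frac{1}{\hat{R}_{(\alpha)}}\right) = F_{(\alpha)} + F^3_{(\alpha)}\mathrm{Var}\left(\hat{R}_{(\alpha)}\right) + O\left(\frac{1}{k^2}\right),$$

from which we obtain the bias-corrected estimator

$$\hat{F}_{(\alpha),hm,c} = \frac{k\frac{\cos\left(\frac{\alpha\pi}{2}\right)}{\Gamma(1+\alpha)}}{\sum_{j=1}^k|x_j|^{-\alpha}}\left(1 - \frac{1}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right)\right),$$

whose bias and variance are

$$E\left(\hat{F}_{(\alpha),hm,c}\right) = F_{(\alpha)} + O\left(\frac{1}{k^2}\right),$$

$$\mathrm{Var}\left(\hat{F}_{(\alpha),hm,c}\right) = \frac{F^2_{(\alpha)}}{k}\left(\frac{2\Gamma^2(1+\alpha)}{\Gamma(1+2\alpha)} - 1\right) + O\left(\frac{1}{k^2}\right).$$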
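To make the harmonic mean estimator concrete, the sketch below simulates it. Everything outside the estimator formula is an assumption for illustration: the samples are drawn with Kanter's representation of a positive stable variable (Laplace transform $\exp(-s^\alpha)$) and rescaled so that $E\left(|x_j|^{-\alpha}\right) = \frac{\cos(\alpha\pi/2)}{\Gamma(1+\alpha)}\frac{1}{F_{(\alpha)}}$, which is the moment the estimator relies on. The function names, the seed and the choice of $k$ are hypothetical.

```python
import math
import random


def sample_skewed_stable(alpha, F, rng):
    # Kanter's representation of a positive stable variable (0 < alpha < 1),
    # rescaled (an assumed parameterization) so that
    # E|x|^(-alpha) = cos(alpha*pi/2) / (Gamma(1+alpha) * F).
    u = math.pi * (1.0 - rng.random())   # uniform on (0, pi]
    w = rng.expovariate(1.0)             # standard exponential
    s = (math.sin(alpha * u) / math.sin(u) ** (1 / alpha)) \
        * (math.sin((1 - alpha) * u) / w) ** ((1 - alpha) / alpha)
    return (F / math.cos(alpha * math.pi / 2)) ** (1 / alpha) * s


def harmonic_mean_estimate(samples, alpha, corrected=True):
    # k * cos(alpha*pi/2)/Gamma(1+alpha) / sum_j |x_j|^(-alpha),
    # optionally multiplied by the O(1/k) bias-correction factor.
    k = len(samples)
    est = k * math.cos(alpha * math.pi / 2) / math.gamma(1 + alpha) \
        / sum(abs(x) ** (-alpha) for x in samples)
    if corrected:
        v = 2 * math.gamma(1 + alpha) ** 2 / math.gamma(1 + 2 * alpha) - 1
        est *= 1 - v / k
    return est


rng = random.Random(0)
alpha, F, k = 0.9, 3.0, 2000
xs = [sample_skewed_stable(alpha, F, rng) for _ in range(k)]
print(harmonic_mean_estimate(xs, alpha, corrected=False),
      harmonic_mean_estimate(xs, alpha, corrected=True))
```

With $k$ in the thousands both versions land close to the true $F_{(\alpha)}$; the correction factor only matters for small $k$, where the $O(1/k)$ bias is visible.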



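We now study the tail bounds. For convenience, we provide tail bounds for $\hat{F}_{(\alpha),hm}$ instead of $\hat{F}_{(\alpha),hm,c}$. We first
analyze the following moment generating function:

$$E\left(\exp\left(t\frac{F_{(\alpha)}|x_j|^{-\alpha}}{\cos(\alpha\pi/2)/\Gamma(1+\alpha)}\right)\right) = 1 + \sum_{m=1}^\infty\frac{t^m}{m!}E\left(\frac{F_{(\alpha)}|x_j|^{-\alpha}}{\cos(\alpha\pi/2)/\Gamma(1+\alpha)}\right)^m$$

$$= 1 + \sum_{m=1}^\infty\frac{t^m}{m!}\frac{\Gamma(1+m)\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)} = \sum_{m=0}^\infty\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}t^m.$$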

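For the right tail bound,

$$\Pr\left(\hat{F}_{(\alpha),hm} - F_{(\alpha)} \ge \epsilon F_{(\alpha)}\right) = \Pr\left(\frac{k\cos(\alpha\pi/2)/\Gamma(1+\alpha)}{\sum_{j=1}^k|x_j|^{-\alpha}} \ge (1+\epsilon)F_{(\alpha)}\right)$$

$$= \Pr\left(\exp\left(-t\frac{\sum_{j=1}^kF_{(\alpha)}|x_j|^{-\alpha}}{\cos(\alpha\pi/2)/\Gamma(1+\alpha)}\right) \ge \exp\left(-\frac{tk}{1+\epsilon}\right)\right) \quad (t > 0)$$

$$\le \left(\sum_{m=0}^\infty\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}(-t)^m\right)^k\exp\left(\frac{tk}{1+\epsilon}\right)$$

$$= \exp\left(-k\left(-\log\left(\sum_{m=0}^\infty\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}(-t_1^*)^m\right) - \frac{t_1^*}{1+\epsilon}\right)\right),$$

where $t_1^*$ is the solution to

$$\frac{\sum_{m=1}^\infty(-1)^mm(t_1^*)^{m-1}\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}}{\sum_{m=0}^\infty(-1)^m(t_1^*)^m\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}} = -\frac{1}{1+\epsilon}.$$

For the left tail bound,

$$\Pr\left(\hat{F}_{(\alpha),hm} - F_{(\alpha)} \le -\epsilon F_{(\alpha)}\right) = \Pr\left(\frac{k\cos(\alpha\pi/2)/\Gamma(1+\alpha)}{\sum_{j=1}^k|x_j|^{-\alpha}} \le (1-\epsilon)F_{(\alpha)}\right)$$

$$= \Pr\left(\exp\left(t\frac{\sum_{j=1}^kF_{(\alpha)}|x_j|^{-\alpha}}{\cos(\alpha\pi/2)/\Gamma(1+\alpha)}\right) \ge \exp\left(\frac{tk}{1-\epsilon}\right)\right) \quad (t > 0)$$

$$\le \left(\sum_{m=0}^\infty\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}t^m\right)^k\exp\left(-\frac{tk}{1-\epsilon}\right)$$

$$= \exp\left(-k\left(-\log\left(\sum_{m=0}^\infty\frac{\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}(t_2^*)^m\right) + \frac{t_2^*}{1-\epsilon}\right)\right),$$

where $t_2^*$ is the solution to

$$\sum_{m=1}^\infty\left\{\frac{(t_2^*)^{m-1}\Gamma^{m-1}(1+\alpha)}{\Gamma(1+(m-1)\alpha)} - (1-\epsilon)\frac{m(t_2^*)^{m-1}\Gamma^m(1+\alpha)}{\Gamma(1+m\alpha)}\right\} = 0.$$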
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